
Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

Neural Information Processing Systems

In this paper, we provide a unified analysis of temporal difference learning algorithms with linear function approximators by exploiting their connections to Markov jump linear systems (MJLS). We tailor the MJLS theory developed in the control community to characterize the exact behaviors of the first and second order moments of a large family of temporal difference learning algorithms. For both the IID and Markov noise cases, we show that the evolution of some augmented versions of the mean and covariance matrix of the TD estimation error exactly follows the trajectory of a deterministic linear time-invariant (LTI) dynamical system. Applying the well-known LTI system theory, we obtain closed-form expressions for the mean and covariance matrix of the TD estimation error at any time step. We provide a tight matrix spectral radius condition to guarantee the convergence of the covariance matrix of the TD estimation error, and perform a perturbation analysis to characterize the dependence of the TD behaviors on the learning rate. For the IID case, we provide an exact formula characterizing how the mean and covariance matrix of the TD estimation error converge to their steady-state values at a linear rate. For the Markov case, we use our formulas to explain how the behaviors of TD learning algorithms are affected by the learning rate and the underlying Markov chain. For both cases, upper and lower bounds for the mean square TD error are provided. The mean square TD error is shown to converge linearly to an exact limit.
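To make the object of study concrete, the following is a minimal sketch of TD(0) with linear function approximation on a small synthetic Markov reward process, estimating the empirical mean and covariance of the final TD iterate across independent runs. All names (`n_states`, `alpha`, `Phi`, etc.) and the toy problem are illustrative assumptions, not taken from the paper; the paper's contribution is the exact (closed-form) characterization of these moments, whereas this sketch only samples them.

```python
import numpy as np

# Illustrative toy setup (not from the paper): a random ergodic
# Markov reward process with linear features.
rng = np.random.default_rng(0)

n_states, d = 4, 2          # states, feature dimension
gamma, alpha = 0.9, 0.05    # discount factor, learning rate

P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
r = rng.random(n_states)            # per-state rewards
Phi = rng.random((n_states, d))     # one feature vector per state


def run_td(n_steps):
    """One trajectory of TD(0) with linear value approximation."""
    theta = np.zeros(d)
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        # TD error: r(s) + gamma * V(s') - V(s), with V(s) = Phi[s] @ theta
        delta = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * delta * Phi[s]
        s = s_next
    return theta


# Empirical first and second moments of the TD iterate over many runs;
# the paper derives their exact evolution via MJLS/LTI theory instead.
runs = np.array([run_td(500) for _ in range(200)])
mean_theta = runs.mean(axis=0)
cov_theta = np.cov(runs.T)
print(mean_theta.shape, cov_theta.shape)
```

Monte-Carlo estimates like these are what the paper's closed-form expressions replace: the mean and covariance trajectories are shown to follow a deterministic LTI system exactly, at every time step.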


Reviews: Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

Neural Information Processing Systems

Summary:
- The main contribution of the paper is to write the TD update as an MJLS over an augmented parameter space, with one parameter vector for each pair of states in the underlying MDP.
- After presenting MJLS and the idea of the augmented parameter space, the authors first consider the IID case, where pairs of states are chosen IID, and give formulas for the expected error and its covariance. Under an additional ergodicity assumption, they give a convergence rate to limiting quantities. For small learning rates (not exactly clear how small in terms of problem parameters), a perturbation analysis gives an estimate of this convergence rate (although the value of λ_{max real}(Ā) remains unclear in terms of the problem parameters).

Pros:
- The originality of the connection between TD dynamics and MJLS is a good contribution that could increase the flow of ideas from control theory to RL. In addition, the formulation of the augmented state space seems to be a potentially useful analysis tool.
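As a rough sketch of the IID-case mean dynamics the review refers to: in generic TD(0) notation (the symbols Ā, b̄ below follow the review's Ā but are not copied from the paper), taking expectations of the stochastic update turns the mean error into a deterministic LTI recursion, whose convergence rate is governed by the eigenvalues of I − αĀ.

```latex
% Hedged sketch, generic TD(0) notation (not copied from the paper):
% IID samples (A_k, b_k) are independent of the current iterate.
\begin{aligned}
\theta_{k+1} &= \theta_k + \alpha\,\bigl(b_k - A_k \theta_k\bigr),
  \qquad \mathbb{E}[A_k] = \bar A,\quad \mathbb{E}[b_k] = \bar b,\\
\theta^\ast &:= \bar A^{-1}\bar b, \qquad
  e_k := \mathbb{E}[\theta_k] - \theta^\ast,\\
e_{k+1} &= (I - \alpha \bar A)\, e_k
  \quad\Longrightarrow\quad
  e_k = (I - \alpha \bar A)^{k}\, e_0 .
\end{aligned}
```

The covariance obeys an analogous (higher-dimensional) linear recursion, which is where the spectral-radius condition and the perturbation analysis in the learning rate enter.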


Reviews: Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory

Neural Information Processing Systems

This paper provides an analysis of temporal-difference algorithms using the theory of Markov jump linear systems (MJLS). Its main contribution is to establish the exact dynamics of the first and second order TD moments under linear function approximation, given by linear time-invariant systems. Reviewers found the technical contributions of this paper to be very strong, with potentially important significance for the study of a central object in RL such as TD learning. The main point of contention is the current presentation, which is cumbersome with notation and page-long theorem statements, and, most importantly, lacks sufficient discussion of how these results relate to existing work on the convergence analysis of TD learning. However, after careful discussion with the reviewers and having read the author feedback (which does promise to improve readability), the AC considered that the positive contributions outweigh the risk of poor readability, and recommends acceptance, urging the authors to address the concerns raised by the reviewers and the AC.

